*************************************************************************
* Myricom MX networking software and documentation                      *
* Copyright (c) 2006 by Myricom, Inc.                                   *
* All rights reserved.  See the file `COPYING' for copyright notice.    *
*************************************************************************

README of MX

MX, or Myrinet Express, is a low-level communication layer for Myrinet.

Table of Contents:
   I. Directory structure of MX distribution
  II. Installation
      1. Configuring and compiling MX.
      2. Installing the MX mcp and driver.
         a.  Load-time Tunable Parameters
         b.  Run-time Tunable Parameters
         c.  Modifying MX Library Behavior at Run-time
      3. Enabling IP connectivity (OPTIONAL)
 III. MX Tool/Utility Functions and Test Programs
  IV. MX Performance
   V. Caveats
      a. Write-combining on i386 and x86_64 hosts

=============================================
I. Directory Structure of the MX distribution
=============================================

mx
|-- common
|-- doc                            MX API Documentation
|-- driver                         MX Kernel Drivers
|   |-- common
|   |-- freebsd
|   |-- linux
|   |-- macosx
|   |-- solaris
|   `-- windows
|-- libmyriexpress                 User-level MX API
|-- libmyriexpresstcp              User-level MX API simulated over tcp
|-- mapper-2xp                     MX Mapper
|-- mcp                            MX Myrinet Control Program (MCP)
|-- tests                          MX Test Suite
|   `-- interp                     mx_interp tests
`-- unit_test

================
II. Installation
================

MX is supported on the following operating systems and processors:

        Linux 2.6 and Linux 2.4 for i386, ia64, x86_64 (including AMD64 
          and EM64T), ppc, ppc64 (including Power4 and Power5).
	Solaris 10 for SPARC and AMD64.
        FreeBSD 5.x for i386, AMD64.
	MacOSX 10.3 and 10.4 for G5 and Intel.

MX-2G Supported NICs: PCIXD, PCIXE, PCIXF

MX-10G Supported NICs: 10G-PCIE-8A-{C,R,Q}

MX-2G can be used only in Myrinet mode, with the NICs connected to
a Myrinet-2000 switch.

MX installation is performed in the following three steps.

1. Configuring and compiling MX:
--------------------------------
        cd $MX_HOME
        ./configure
        make

        By default, configure assumes that the header and config files of
        your Linux kernel (required to compile external modules, and part
        of either the kernel-headers or the kernel-source package, depending
        on your distribution) are pointed to by
        /lib/modules/`uname -r`/{source,build}.  If your Linux installation
        is not standard, or if you are cross-compiling for a kernel other
        than the one running on the compile node, you must configure with
        the following option:

        ./configure --with-linux=<linux-source-dir>

        where <linux-source-dir> specifies the directory of the Linux
        kernel source. The kernel header files MUST match the running kernel
        exactly: not only must they be from the same version, they must
        also contain the same kernel configuration options.

        For 2.6 kernels, the kernel headers and scripts are often split
        across two different directories, in which case you may need both
        --with-linux and --with-linux-build.  For instance, to select a
        specific kernel, you might need something like:

           ./configure --with-linux=/usr/src/linux-2.6.5-7.151/ \
               --with-linux-build=/usr/src/linux-2.6.5-7.151-obj/x86_64/smp/
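
        Because a mismatch between the configured kernel tree and the
        running kernel is a common cause of build or load failures, it can
        help to check the tree before running configure.  The function
        below is a hypothetical helper, not part of MX.  It assumes the
        tree records its release in a "#define UTS_RELEASE" line in
        include/linux/utsrelease.h or include/linux/version.h (in a split
        2.6 tree, these generated headers typically live in the
        --with-linux-build directory), and it compares only the release
        string, not the kernel configuration options.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with MX): succeed only if the
# UTS_RELEASE recorded in a configured kernel tree equals the given release
# string (e.g. the output of `uname -r` on the target node).
mx_kernel_tree_matches() {
    local tree="$1" release="$2" hdr found=""
    for hdr in "$tree/include/linux/utsrelease.h" \
               "$tree/include/linux/version.h"; do
        [ -f "$hdr" ] || continue
        # Expected line format: #define UTS_RELEASE "2.6.5-7.151-smp"
        found=$(sed -n 's/^#define UTS_RELEASE "\(.*\)"$/\1/p' "$hdr")
        [ -n "$found" ] && break
    done
    # Fail if no release was found or if it differs from the expected one.
    [ -n "$found" ] && [ "$found" = "$release" ]
}
```

        Example use:
           mx_kernel_tree_matches /usr/src/linux-2.6.5-7.151-obj/x86_64/smp \
               "$(uname -r)" || echo "kernel tree does not match running kernel"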

	Additional configure options are available:
		--enable-32b		enable 32-bit library
  		--enable-64b		enable 64-bit library
 		--enable-kernel-lib	enable kernel library
 		--prefix=<dir>		install directory (default /opt/mx)
                --disable-sse2          for i386 processors without SSE2
                                        (e.g. Pentium III, 32-bit-only Athlon and older)
		--disable-fms           use the old mapper implementation

2. Installing the MX mcp and driver:
------------------------------------

        Select an installation directory path <install_path>. It is
        usually best for <install_path> to be the path to an NFS
        directory available on all machines that will share this MX
        installation. The directory must be accessible using
        <install_path> on all machines that will share the
        installation. <install_path> must be an absolute path; it
        must start with "/". However, <install_path> may contain
        symbolic links.

        make install prefix=<install_path>

        If you omit prefix=<install_path>, the mcp and driver will be
        installed in the directory specified with the configure "--prefix"
        option, or in the default directory, /opt/mx/. The MX binaries are
        located in <install_path>/bin and <install_path>/sbin. The 32-bit
        MX libraries are installed in <install_path>/lib32 and the 64-bit MX
        libraries are installed in <install_path>/lib64.  The 
        <install_path>/lib directory is a symbolic link to either lib64 or
        lib32 depending on the native wordsize detected by configure.
        E.g., on most ppc64 distributions, gcc defaults to 32-bit, which
        means that lib links to lib32.  However, on most x86_64 distributions,
        gcc defaults to 64-bit, so lib links to lib64.

        Unless specified on the configure line, MX builds 32-bit libraries 
        on 32-bit architectures (i386, ppc) and 64-bit libraries on 64-bit 
        architectures (ia64, AMD64, ppc64, Alpha). It is possible to build 
        both by using the '--enable-32b' and '--enable-64b' configure flags.

        For Mac OS X, when the Apple Xcode compiler is 64-bit, MX fat 
        libraries are built which are usable by both 32-bit and 
        64-bit applications, and libraries are always installed in
        <install_path>/lib.

        For Linux, FreeBSD and Solaris, add the MX library directory to the
        system library search path. Otherwise, individual users will have to
        either manage their LD_LIBRARY_PATH (or, on Solaris, LD_LIBRARY_PATH_64)
        environment variable or link their programs with an "-rpath"/"-R"
        option so that the dynamic linker can locate the MX shared library.
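
        When the system search path cannot be changed, each program can
        instead carry the library location in its run-time path.  The
        helper below is a hypothetical sketch, not an MX tool.  It assumes
        gcc with a GNU-style linker (Linux, FreeBSD), headers installed in
        <install_path>/include, the lib32/lib64 layout described above, and
        the shared library name libmyriexpress (taken from the
        libmyriexpress source directory).  On Linux, the system-wide
        alternative is to list the directory in /etc/ld.so.conf and run
        ldconfig as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print compiler/linker flags for building against an
# MX installation, embedding an rpath so LD_LIBRARY_PATH is not needed.
# Not for Mac OS X, where fat libraries live in <install_path>/lib.
mx_build_flags() {
    local prefix="$1" bits="$2" libdir
    case "$bits" in
        32|64) libdir="$prefix/lib$bits" ;;
        *) echo "word size must be 32 or 64, got '$bits'" >&2; return 1 ;;
    esac
    printf '%s\n' "-I$prefix/include -L$libdir -Wl,-rpath,$libdir -lmyriexpress"
}
```

        Example use (the unquoted $(...) is split into separate flags):
           gcc -o myprog myprog.c $(mx_build_flags /opt/mx 64)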

        Next, you must run

        su root
        <install_path>/sbin/mx_local_install

        on each machine to perform local install steps such as:

          * Linux:   create the devices (/dev/mx* and /dev/mxp*), one device
		     per NIC.
	             install the init script in /etc/init.d if applicable.
	  * FreeBSD: update the devd configuration.
		     install the init script in /etc/init.d if applicable.
          * MacOSX:  install the module in the load directory.
          * Solaris: update /etc/devlink.tab.
		     install the init script in /etc/init.d if applicable.

        Finally, you must run

        su root
        <install_path>/sbin/mx_start_stop start

        on each machine to load the modules.  In Myrinet mode, the script
        also starts a mapper for each Myrinet NIC in the machine.  If
        applicable, the mx_start_stop script is also available as
        /etc/init.d/mx.  Available actions are:

          - start:        unload GM if needed and load MX; the mapper
                          is started automatically.
          - stop:         stop the mapper and unload MX.
          - start-mapper: start the mapper manually (Myrinet mode,
                          experts only).
          - stop-mapper:  stop the mapper manually (Myrinet mode,
                          experts only).
          - status:       indicate whether MX is loaded.
          - restart:      stop the mapper, unload MX and reload MX.

        Note:  The MX software provides a separate kernel module
               for the mcp (firmware) and the driver.  If you do not
               use the mx_start_stop script to load the MX drivers and
               start the MX mapper, you must ensure that the mcp module
               is loaded first, as the driver module depends on it.
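
        Since mx_local_install and mx_start_stop must be run on every
        node, administrators often script them.  The following is a
        hypothetical sketch, not part of MX.  It assumes password-less ssh
        as root to each node and that <install_path> resolves to the same
        installation on every node (for example the shared NFS directory
        recommended above).  Setting DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the per-node MX install steps on each host
# given on the command line, stopping at the first failure.
# Usage: mx_setup_nodes <install_path> host1 [host2 ...]
mx_setup_nodes() {
    local prefix="$1" host step
    shift
    for host in "$@"; do
        for step in "$prefix/sbin/mx_local_install" \
                    "$prefix/sbin/mx_start_stop start"; do
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "ssh root@$host $step"
            else
                # ssh hands the string to the remote shell, which splits
                # "mx_start_stop start" into the command and its action.
                ssh "root@$host" "$step" || return 1
            fi
        done
    done
}
```

        Example use:
           DRY_RUN=1 mx_setup_nodes /opt/mx node01 node02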

   a. Load-time Tunable Parameters:
   -------------------------------

        The MX driver and mcp contain a number of tunable parameters
        which may be adjusted by the customer when the MX driver is
        loaded.  (We do NOT recommend modifying driver parameters unless
        you are confident of what you are doing.)  These parameters are
        set in an OS-dependent manner.

        On Linux, there are three possibilities:

            insmod <install_path>/sbin/mx_driver.o mx_PARAM=VALUE

            Or, you can pass driver parameters in the MX_MODULE_PARAMS
            environment variable like:

            env MX_MODULE_PARAMS="mx_PARAM=VALUE" mx_start_stop start  

	    Or, you may pass parameters after the action on the
	    command line like:

	    mx_start_stop start mx_PARAM=VALUE  ...

	    On Linux 2.6, some parameters may also be changed after loading
	    by writing into /sys/module/mx_driver/parameters/mx_PARAM.
	   
        On FreeBSD:
	    kenv mx.PARAM=VALUE
	    kldload <install_path>/sbin/mx.ko

        On MacOSX:
            Find the old value of boot-args:
              nvram -p | grep boot-args
            Append the desired parameter to boot-args, where
            $OLD_BOOT_ARGS stands for the old value found above:
              nvram "boot-args=$OLD_BOOT_ARGS mx.PARAM=VALUE"
            Reboot:
              shutdown -r now

        On Solaris:
	    Edit /kernel/drv/mx_driver.conf, and set mx_PARAM=VALUE;
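
        On MacOSX, the old and new boot-args values have to be merged by
        hand.  The function below is a hypothetical sketch of that step,
        not an MX or Apple tool.  It assumes that "nvram -p | grep
        boot-args" prints the variable name followed by whitespace and
        its value (or nothing when boot-args is unset), and it prints the
        value to pass back to nvram.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a new boot-args value by appending one MX
# parameter (e.g. mx.PARAM=VALUE) to the line printed by
# `nvram -p | grep boot-args`.
mx_boot_args() {
    local line="$1" param="$2" old
    old="${line#boot-args}"                 # drop the variable name
    old="${old#"${old%%[![:space:]]*}"}"    # drop leading whitespace
    if [ -n "$old" ]; then
        printf '%s %s\n' "$old" "$param"
    else
        printf '%s\n' "$param"
    fi
}
```

        Example use (as root):
           nvram "boot-args=$(mx_boot_args "$(nvram -p | grep boot-args)" mx.PARAM=VALUE)"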

	The current tunable parameters include:
	    (remember that 'mx_PARAM=VALUE' must be replaced with
	     'mx.PARAM=VALUE' on FreeBSD and MacOSX)

	* mx_debug_mask:	default = 0
		This is a bitmap of debug messages to be printed
		when the driver is configured with --enable-debug.
		See common/mx_debug.h

	* mx_max_instance:	default = 4
	        This is the maximum number of Myrinet NICs the
		driver can support.

	* mx_max_endpoints:	default = 4
		This is the maximum number of endpoints per Myrinet NIC.

	* mx_max_nodes:		default = 1024
		This is the maximum number of remote nodes supported.

	* mx_max_send_handles:	default = 32
		This is the maximum number of simultaneous sends in the NIC.

	* mx_mapper_path:	default = /opt/mx/sbin/mx_start_mapper
		String specifies the path to a script which
		is run by the driver to start the mapper.
		This is used internally by the mx_start_stop script.

	* mx_ether_rx_frags:	default = 0
		When enabled, the Linux ethernet driver will use fragmented
		skbufs to receive big ethernet frames.  This is intended
		to work around problems allocating jumbo frames on
		machines with little free memory.

	* mx_small_message_threshold:	default = 128
		Size (in bytes) below which messages will be written
		with PIO on the sending side.

	* mx_medium_message_threshold:	default = 32768
		Size (in bytes) below which messages will be copied
		prior to transmission, rather than being DMA'ed
		directly from their current location.

	* mx_override_e_to_f:	default = 0
		When enabled, the driver treats PCIXE cards as if
		they were PCIXF cards, and disables the second port.
		This is primarily for testing and development.

	* mx_security_disabled:	default = 0
		If security is disabled, unprivileged users may do
		things like clearing the mcp counters and examining
		the LANai SRAM.

	* mx_msi:		default = 0
		When enabled, and when kernel support is present,
		the Linux driver will use Message Signaled Interrupts,
		rather than legacy PCI interrupts.

	* mx_intr_coal_delay:	default = 10
		This is the maximum delay before raising an interrupt
		when coalescing (in microseconds).
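
        Because the parameter names differ between systems only in their
        prefix, a script that must run on several operating systems can
        keep a single list of Linux-style settings and translate it.  A
        hypothetical sketch follows, assuming the OS name as reported by
        "uname -s" (Darwin for MacOSX).

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate "mx_PARAM=VALUE" settings into the
# "mx.PARAM=VALUE" form used on FreeBSD and MacOSX; other systems
# (Linux, Solaris) keep the mx_ prefix.
# Usage: mx_param_for_os <uname -s output> mx_PARAM=VALUE ...
mx_param_for_os() {
    local os="$1" setting
    shift
    for setting in "$@"; do
        case "$os" in
            FreeBSD|Darwin) printf 'mx.%s\n' "${setting#mx_}" ;;
            *)              printf '%s\n' "$setting" ;;
        esac
    done
}
```

        Example use on FreeBSD:
           for s in $(mx_param_for_os "$(uname -s)" mx_max_endpoints=8); do
               kenv "$s"
           done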

   b. Run-time Tunable Parameters:
   ------------------------------
        On Solaris: 
            The per-project locked-memory limits are quite low (32MB by
            default).  When an application needs to lock more memory than
            this, it may fail with a "pin failure case not implemented"
            message.

            The per-project limit can be raised with prctl, for example
            for the default project (project 3):
                prctl -n project.max-device-locked-memory -v 300MB -r -i project 3

            Project numbers are defined in /etc/project; the standard
            entries are:
		system:0
		user.root:1
		noproject:2
		default:3
		group.staff:10
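
        The project number passed to prctl can also be looked up by name.
        The helper below is a hypothetical sketch, not part of MX or
        Solaris.  It assumes the usual colon-separated /etc/project format
        (name:id:comment:users:groups:attributes) and an awk supporting -v;
        on Solaris, set AWK=nawk (or /usr/xpg4/bin/awk) since the old
        /usr/bin/awk lacks it.  It only prints the prctl command, so the
        command can be reviewed before being run as root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the prctl command that raises the locked-memory
# limit of the named project, looking its ID up in an /etc/project-format file.
# Usage: mx_prctl_cmd <project-name> <limit, e.g. 300MB> [project-file]
mx_prctl_cmd() {
    local project="$1" limit="$2" file="${3:-/etc/project}" id
    id=$("${AWK:-awk}" -F: -v p="$project" '$1 == p { print $2; exit }' "$file")
    if [ -z "$id" ]; then
        echo "project '$project' not found in $file" >&2
        return 1
    fi
    echo "prctl -n project.max-device-locked-memory -v $limit -r -i project $id"
}
```

        Example use on Solaris:
           AWK=nawk mx_prctl_cmd default 300MB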

   c. Modifying MX Library Behavior at Run-time:
   --------------------------------------------

   The MX library behavior and functionality can be modified with run-time
   environment variables and/or configure-time options. The defaults are
   intended to work well with all applications, so this documentation is
   meant for advanced users.

   These run-time environment variables are categorized as follows:

   * Registration cache:

     MX_RCACHE=[0|1] (default=0)

     The MX_RCACHE environment variable is described in the FAQ entry
     "How do I obtain the maximum bandwidth performance with MPICH-MX?"
     (http://www.myri.com/cgi-bin/fom?file=463).

   * Communication channels:

     There are three communication channels used by the MX library:

     * the network channel, where messages go through the NIC and the
       network;
     * the shared-memory channel, where messages are exchanged between
       processes on the same machine using shared memory and special
       system calls implemented by the MX driver;
     * the self channel, for intra-process messages, implemented entirely
       within the process.

     The use of these channels is regulated by two environment variables:

     MX_DISABLE_SHMEM=[0|1] (default=0)

     Setting this variable to 1 disables the shared-memory channel.
     Communication between endpoints on the same machine then always goes
     through the network.

     MX_DISABLE_SELF=[0|1] (default=0)

     Setting this variable to 1 disables the self-communication channel.
     Typically you will also set MX_DISABLE_SHMEM=1, so that even
     intra-process messages go through the network.  If only
     MX_DISABLE_SELF is set, and not MX_DISABLE_SHMEM, the shared-memory
     channel is used for intra-process communication.

     MX_DISABLE_SHMEM and MX_DISABLE_SELF should be set consistently
     across all MX processes within the same job.

     See also the FAQ entry "How is software loopback implemented in MX?"
     (http://www.myri.com/cgi-bin/fom?file=450).

   * Statistics:

     MX_STATS=[0|1] (default=0):

     Setting this variable to 1 will enable the reporting of statistics
     about various events in the library on a per-endpoint basis.
     Statistics are displayed when the endpoint is closed.
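
     The channel-selection rules for MX_DISABLE_SELF and MX_DISABLE_SHMEM
     described under "Communication channels" can be summarized as a small
     decision function.  This is a hypothetical illustration of the
     documented rules, not code from the MX library; the destination
     names "self", "local" and "remote" are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical model of the channel choice, driven by the same environment
# variables as the MX library.
#   self   = destination endpoint in the same process
#   local  = another process on the same machine
#   remote = another machine
mx_channel() {
    local dest="$1"
    local no_self="${MX_DISABLE_SELF:-0}" no_shmem="${MX_DISABLE_SHMEM:-0}"
    if [ "$dest" = self ] && [ "$no_self" != 1 ]; then
        echo self
    elif [ "$dest" != remote ] && [ "$no_shmem" != 1 ]; then
        echo shmem
    else
        echo network
    fi
}
```

     For instance, "MX_DISABLE_SELF=1 mx_channel self" prints shmem, while
     adding MX_DISABLE_SHMEM=1 makes it print network, matching the text.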

3. Enabling IP connectivity (OPTIONAL):
--------------------------------------

	* Linux and FreeBSD:
	        If you wish to enable IP connectivity (Ethernet emulation
                in Myrinet mode, or Ethernet driver for Ethernet mode),
	        the command is as follows:

		/sbin/ifconfig myri0 <ip_address> up

		where you must replace myri0 with the appropriate name
		(myri1, myri2, etc.) if you have more than one Myrinet
		NIC per host.

        * Solaris:

                If you wish to run IP over Myrinet (ethernet emulation),
                the command to enable IP over MX is as follows:

                ifconfig myri0 plumb <ip_address> up

                where you must replace myri0 with the appropriate name
                (myri1, myri2, etc.) if you have more than one Myrinet
                NIC per host.

                Note for Solaris 10 GA: due to a bug, Solaris 10 GA as
                released does not support 9000-byte (jumbo) frames. Jumbo
                frames are critical to obtaining full Ethernet performance,
                so we encourage you to apply Sun's patch 119832-01 for
                UltraSPARC or patch 119833-01 for AMD64. For patch access,
                refer to the SunSolve Patch Access web page at
                <http://au.sunsolve.sun.com/pub-cgi/show.pl?target=patches/patch-access>.
                If you do not apply this patch, you will need to set the
                Ethernet MTU to 1500 bytes in /kernel/drv/myri.conf by
                specifying mx_mtu_override=1500 when loading MX. No jumbo
                frames will be allocated until after the myri0 interface
                is plumbed.

	* MacOSX:

		You should configure the MX ethernet emulation
		interface as you would any other ethernet interface.
		On most systems, the MX ethernet adaptor will appear as
		"en1".  It is possible, if you have additional network
		cards, that the adaptor will appear as "en2", "en3",
		etc.

		To verify which ethernet adaptor belongs to MX, you
		may need to run the "Network Utility"
		(/Applications/Utilities/Network Utility). Click the
		"Info" tab and select each "Network Interface" from
		the menu until you find the one whose Vendor is
		Myricom, and whose Hardware Address matches the MAC
		address printed by "mx_info".
		
		Once you have found the correct adaptor, configure it via:
		
		System Preferences -> Network -> Show -> Ethernet Adaptor (enX)


================================================
III. MX Tool/Utility Functions and Test Programs
================================================

The following MX tool programs

  mx_counters
  mx_dmabench
  mx_endpoint_info
  mx_hostname
  mx_info

are available in the <install_path>/bin/ directory.  Test programs are
also available in the <install_path>/bin/tests directory.

Refer to the <install_path>/bin/README for details.

==================
IV. MX Performance
==================

The MX Benchmark Programs 

  mx_pingpong
  mx_pingpong_unex
  mx_stream

are located in the <install_path>/bin/tests/ directory.

Directions for running these benchmark programs can be found in the 
<install_path>/bin/README.

===========
V.  Caveats
===========

a. Write-combining on i386 and x86_64 hosts

   For optimal performance of MX on i386 and x86_64 hosts, write-combining
   must be enabled on the PCI chipset of the host. MX enables
   write-combining on i386 and x86_64 hosts at driver load time, provided
   no conflicting attribute regions for physical or PCI memory already
   exist.  If MX is unable to enable write-combining at load time, an
   error message like

mtrr: type mismatch for fd000000,1000000 old: uncachable new: write-combining

   will appear in the kernel log.

   Refer to the Myrinet FAQ entry "When I load MX-2G, I see an error message
   about 'mtrr: type mismatch ... write-combining'.  What does this error
   message mean?" (http://www.myri.com/cgi-bin/fom?file=416).
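
   To check whether a node hit this problem, the kernel log can be
   searched for the message quoted above.  The filter below is a
   hypothetical sketch, not an MX tool: it reads log text on standard
   input and prints the base,size pair from each matching line, assuming
   the message format shown above.  On Linux, the current MTRR settings
   themselves can be inspected in /proc/mtrr.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print the "base,size" of every MTRR region that
# prevented write-combining, e.g.  dmesg | mx_wc_conflicts
# No output means no such message was found.
mx_wc_conflicts() {
    sed -n 's/^.*mtrr: type mismatch for \([0-9a-fA-F]*,[0-9a-fA-F]*\) old: .* new: write-combining.*$/\1/p'
}
```

   Example use:
      dmesg | mx_wc_conflicts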
